Spectral Mapping Using Artificial Neural Networks for Intra-lingual and Cross-lingual Voice Conversion
Abstract
CERTIFICATE
This is to certify that the work contained in this thesis titled Spectral mapping using … have not been submitted to any other Institute or University for the award of any degree or diploma.
Date    Mr. Kishore Prahallad
ACKNOWLEDGEMENTS
I would like to express my deepest appreciation to Kishore Prahallad, my advisor, for his guidance, encouragement and support throughout my time as an MS student at IIIT-H. I thank him for allowing me to explore the challenging world of speech technology and for always finding the time to discuss the difficulties along the way. I express my sincere gratitude to Prof. B. Yegnanarayana and Prof. Alan W Black for their valuable advice and inputs during this research. … Centre, IIIT-H for providing an excellent environment for work with ample facilities and academic freedom. I thank Guruprasad Sheshadri for all his critical comments on my thesis draft, which taught me how to write better. … particular, I thank Venkatesh Keri and Veera Raghavendra for all the support, fruitful discussions and fun times together. I am also thankful to the graduate students of IIIT and MSIT who participated in several subjective tests throughout my work. Needless to say, without the love and moral support of my father, mother and sister, this work would not have been possible.
Voice conversion is the process of transforming an utterance of a source speaker so that it is perceived as if spoken by a specified target speaker. Applications of voice conversion include secure transmission, speech-to-speech translation and generating voices for virtual characters/avatars. The process involves transforming the acoustic cues that carry a speaker's identity, such as the spectral parameters characterizing the vocal tract, the fundamental frequency and prosody.
Spectral parameters representing the vocal tract shape are known to contribute more to speaker identity than the other cues, and hence there have been efforts to find a better spectral mapping between the source and the target speaker. In this dissertation, we propose an Artificial Neural Network (ANN) based spectral mapping and compare its performance against the state-of-the-art Gaussian Mixture Model (GMM) based mapping. We show that the ANN-based voice conversion system performs better than the GMM-based voice conversion system. A typical requirement for a voice conversion system is that both the source and target speakers record the same set of utterances, referred to as parallel data. A mapping function obtained on such parallel data can …
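To make the idea of a spectral mapping learned from parallel data concrete, here is a minimal sketch in Python with NumPy. It is not the thesis's system: the one-hidden-layer network, the tanh activation, the learning rate, and the synthetic "spectral frames" are all assumptions chosen for illustration, and the source and target frames are assumed to be already time-aligned. The names `train_mapping` and `convert` are hypothetical.

```python
import numpy as np

def train_mapping(src, tgt, hidden=8, epochs=500, lr=0.05, seed=0):
    """Fit a one-hidden-layer network that maps source frames to target frames.

    src, tgt: arrays of shape (n_frames, n_dims), assumed time-aligned.
    Trained with full-batch gradient descent on mean squared error.
    """
    rng = np.random.default_rng(seed)
    n, d_in = src.shape
    d_out = tgt.shape[1]
    W1 = rng.normal(0.0, 0.1, (d_in, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, d_out))
    b2 = np.zeros(d_out)
    for _ in range(epochs):
        h = np.tanh(src @ W1 + b1)           # hidden activations
        err = (h @ W2 + b2) - tgt            # prediction error
        dh = (err @ W2.T) * (1.0 - h ** 2)   # backpropagate through tanh
        W2 -= lr * (h.T @ err) / n
        b2 -= lr * err.mean(axis=0)
        W1 -= lr * (src.T @ dh) / n
        b1 -= lr * dh.mean(axis=0)
    return W1, b1, W2, b2

def convert(params, src):
    """Apply the learned mapping to source frames."""
    W1, b1, W2, b2 = params
    return np.tanh(src @ W1 + b1) @ W2 + b2

# Synthetic stand-in for aligned parallel data (assumption: real systems
# would use spectral features extracted from recorded utterances).
rng = np.random.default_rng(1)
src = rng.normal(size=(500, 4))
tgt = src @ (0.5 * rng.normal(size=(4, 4))) + 0.5

params = train_mapping(src, tgt)
converted = convert(params, src)

# Compare against a trivial baseline that always predicts the mean target frame.
baseline_mse = np.mean((tgt - tgt.mean(axis=0)) ** 2)
trained_mse = np.mean((converted - tgt) ** 2)
print(f"baseline MSE: {baseline_mse:.4f}, mapped MSE: {trained_mse:.4f}")
```

The comparison against a mean-frame baseline is only a sanity check that the network has learned something about the source-to-target relation; it says nothing about perceived speaker identity, which the abstract evaluates through subjective tests.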
Related resources
A phonetic assessment of cross-language voice conversion
Cross-language voice conversion maps the speech of speaker S1 in language L1 to the voice of speaker S2 using knowledge only of how S2 speaks a different language L2. This mapping is usually performed using speech material from S1 and S2 that has been deemed “equivalent” in either acoustic or phonetic terms. This study investigates the issue of equivalence in more detail, and contrasts the perf...
Cross-Lingual Voice Conversion
Cross-lingual voice conversion refers to the automatic transformation of a source speaker’s voice to a target speaker’s voice in a language that the target speaker cannot speak. It involves a set of statistical analysis, pattern recognition, machine learning, and signal processing techniques. This study focuses on the problems related to cross-lingual voice conve...
Voice Conversion Using Articulatory Features
The aim of voice conversion is to transform an utterance spoken by an arbitrary (source) speaker to that of a specific (target) speaker. Text-to-speech (TTS), speech-to-speech translation, mimicry generation and human-machine interaction systems are among the numerous applications that can benefit greatly from a voice conversion module. Generally voice conversion systems require para...
A flexible and modular crosslingual voice conversion system
A cross-lingual voice conversion system aims at modifying the timbral structure of recorded sentences from a source speaker, in order to obtain processed sentences which are perceived as the same sentences uttered by a target speaker. This work presents the cross-lingual voice conversion problem as a network of related sub-problems and discusses several techniques for solving each of these sub-pr...
Frame alignment method for cross-lingual voice conversion
Most of the existing voice conversion methods calculate the optimal transformation function from a given set of paired acoustic vectors of the source and target speakers. The alignment of the phonetically equivalent source and target frames is problematic when the training corpus available is not parallel, although this is the most realistic situation. The alignment task is even more difficult ...